欢迎关注“小丫画图”公众号,同名知识星球等你加入
小丫微信: epigenomics E-mail: figureya@126.com
作者:Haitao Wang
Dr. Haitao Wang: JNU,Guagnzhou => GIBH-CAS,Guagnzhou => BNU,Beijing => IMCB,Singapore => UMAC,Macao => NCCS,Singapore
Email: ht.wang@yahoo.com; wang.haitao@nccs.com.sg
小丫注释、校验
我要DIY出paper里的这种对比多个subtype的composite copy number profiles:
出自https://www.nature.com/articles/nature20805
GISTIC自带画图功能(见结尾“附二”),能画出类似于C这种图。从firehose能下载到TCGA各癌症的Segmented copy number profiles(类似于B)和Genomic positions of amplified regions(类似于C),但只有全部sample的图,我要对比不同subtype:
出自https://www.sciencedirect.com/science/article/pii/S2352396418302986?via%3Dihub
此处提供的DIY画图方法更灵活,可以画gistic score,还可以画percentage/frequency。
可单独画一个sample,也可以对比多组subtypes。
使用国内镜像安装包
options("repos"= c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
#其他物种到这里找包的名字http://bioconductor.org/packages/3.8/data/annotation/
加载包
library(BSgenome.Hsapiens.UCSC.hg19)
## Loading required package: BSgenome
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind,
## colMeans, colnames, colSums, dirname, do.call, duplicated,
## eval, evalq, Filter, Find, get, grep, grepl, intersect,
## is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
## paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
## Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
## table, tapply, union, unique, unsplit, which, which.max,
## which.min
## Loading required package: S4Vectors
## Loading required package: stats4
##
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
##
## expand.grid
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Warning: package 'GenomeInfoDb' was built under R version 3.5.2
## Loading required package: GenomicRanges
## Loading required package: Biostrings
## Warning: package 'Biostrings' was built under R version 3.5.2
## Loading required package: XVector
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
##
## strsplit
## Loading required package: rtracklayer
## Warning: package 'rtracklayer' was built under R version 3.5.2
Sys.setenv(LANGUAGE = "en") #显示英文报错信息
options(stringsAsFactors = FALSE) #禁止chr转成factor
画copy number profile需要gistic score和染色体信息,其中gistic score可以用GISTIC 2.0计算(输入segment file)。
如果你已经算出gistic scores,保存为“easy_input_*scores.gistic.txt”,就可以跳过这步,直接进入“准备染色体信息”。
下载全部样本的segment file,然后按subtype分开,用GISTIC 2.0计算gistic scores,然后用算出的gistic scores来画图。
focal_input.seg.txt:segment file。从firehose下载hg19版本的TCGA 数据,例文用 ESCA,压缩包链接:http://gdac.broadinstitute.org/runs/analyses__2016_01_28/data/ESCA/20160128/gdac.broadinstitute.org_ESCA-TP.CopyNumber_Gistic2.Level_4.2016012800.0.0.tar.gz。解压缩,点击nozzle.html,查看其他结果图表和Methods。需要里面的focal_input.seg.txt和scores.gistic(后面直接用它画全部样本的图)两个文件。
hg38版本(TCGA数据)segment的获取方法见“附一”。
### segment information
tumor.seg.cnv <- read.table("focal_input.seg.txt", sep="\t", header=T, stringsAsFactors=F)
#tail(tumor.seg.cnv)
tumor.seg.cnv$Sample <- gsub("-[0-9A-Z]*-[0-9A-Z]*-[0-9A-Z]*-01$", "", tumor.seg.cnv$Sample)
#tail(tumor.seg.cnv)
从TCGA获得亚型的sample ID,然后把focal_input.seg.txt拆分成亚型的segment file
此处模仿例文,根据Histological.Type…Oesophagus和Gastric.classification分为四种subtype。实际操作时可根据每种癌症的具体情况选择合适的列来分亚型做对比。
# Get subtypes information
library(TCGAbiolinks)
## Warning: package 'TCGAbiolinks' was built under R version 3.5.2
dataSub <- data.frame(TCGAquery_subtype(tumor = "ESCA"))
## esca subtype information from:doi:10.1038/nature20805
# dataSub跟例文的Supplementary Table 1是同一个文件
table(dataSub$Histological.Type...Oesophagus)
##
## EAC ESCC
## 72 90
table(dataSub$Tissue.level.Location)
##
## Gastric Oesophageal
## 359 164
EAC_id <- dataSub[dataSub$Histological.Type...Oesophagus=="EAC", "patient"]
ESCC_id <- dataSub[dataSub$Histological.Type...Oesophagus=="ESCC", "patient"]
GA <- dataSub[dataSub$Tissue.level.Location=="Gastric", ]
table(GA$Gastric.classification)
##
## CIN EBV GS MSI
## 188 30 66 75
GA_CIN_id <- dataSub[dataSub$Gastric.classification=="CIN", "patient"]
GA_nonCIN_id <- dataSub[dataSub$Gastric.classification!="CIN", "patient"]
# Get subtype segments information
EAC.seg.cnv <- tumor.seg.cnv[tumor.seg.cnv$Sample %in% EAC_id,]
ESCC.seg.cnv <- tumor.seg.cnv[tumor.seg.cnv$Sample %in% ESCC_id,]
GA_CIN.seg.cnv <- tumor.seg.cnv[tumor.seg.cnv$Sample %in% GA_CIN_id,]
GA_nonCIN.seg.cnv <- tumor.seg.cnv[tumor.seg.cnv$Sample %in% GA_nonCIN_id,]
write.table(EAC.seg.cnv, file="tumor.EAC.seg.txt", sep="\t", row.names=F, quote = F)
write.table(ESCC.seg.cnv, file="tumor.ESCC.seg.txt", sep="\t", row.names=F, quote = F)
write.table(GA_CIN.seg.cnv, file="tumor.GACIN.seg.txt", sep="\t", row.names=F, quote = F)
write.table(GA_nonCIN.seg.cnv, file="tumor.GAnonCIN.seg.txt", sep="\t", row.names=F, quote = F)
注:segment file作为GISTIC 2.0的输入,用GISTIC 2.0计算gistic score的方法见“附二”。
计算将获得easy_input_*.scores.gistic.txt文件:
# Create a chromosomes reference objects function
chrom_extract <- function(BSgenome.hg = NULL) {
if (is.null(BSgenome.hg )) stop("NULL object !", call. = FALSE)
obj <- list(species = GenomeInfoDb::organism(BSgenome.hg), genomebuild = BSgenome::providerVersion(BSgenome.hg))
df <- data.frame(chrom = BSgenome::seqnames(BSgenome.hg), chrN = seq_along(BSgenome::seqnames(BSgenome.hg)), chr.length = GenomeInfoDb::seqlengths(BSgenome.hg), stringsAsFactors = FALSE)
df <- df[1:24,]
df$chr.length.sum <- cumsum(as.numeric(df$chr.length))
df$chr.length.cumsum <- c(0, df$chr.length.sum[-nrow(df)])
df$middle.chr <- round(diff(c(0, df$chr.length.sum)) /2)
df$middle.chr.genome <- df$middle.chr + df$chr.length.cumsum
obj$chromosomes <- df
obj$chrom2chr <- sapply(obj$chromosomes$chrom, function(k) { obj$chromosomes$chrN[obj$chromosomes$chrom == k]}, simplify = FALSE)
obj$chr2chrom <- sapply(obj$chromosomes$chrN, function(k) { obj$chromosomes$chrom[obj$chromosomes$chrN == k]}, simplify = FALSE)
names(obj$chr2chrom) <- obj$chromosomes$chrN
obj$genome.length <- sum(as.numeric(obj$chromosomes$chr.length), na.rm = TRUE)
return(obj)
}
# Extract a chromosomes reference loci
BSgenome.hg = "BSgenome.Hsapiens.UCSC.hg19"
BSg.obj <- getExportedValue(BSgenome.hg, BSgenome.hg)
genome.version <- BSgenome::providerVersion(BSg.obj)
chrom <- chrom_extract(BSg.obj)
#str(chrom)
分别画全部样本的gistic score和percentage/frequency,再画四个subtye对比的gistic score和percentage/frequency。
#pdf("ESCA_copy_number_gistic_score.pdf",12,5)
# Import gistic2 results read gistic output file
scores <- read.table("easy_input_scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
head(scores)
## Type Chromosome Start End X.log10.q.value. G.score
## 1 Amp 1 3218610 3229258 0 0.134372
## 2 Amp 1 3235468 3279324 0 0.141093
## 3 Amp 1 3280814 3522124 0 0.128763
## 4 Amp 1 3535644 3565825 0 0.136513
## 5 Amp 1 3575833 3614480 0 0.128763
## 6 Amp 1 3614576 3654595 0 0.136279
## average.amplitude frequency
## 1 0.419716 0.206522
## 2 0.422953 0.211957
## 3 0.419967 0.201087
## 4 0.433913 0.201087
## 5 0.417425 0.201087
## 6 0.432290 0.201087
unique(scores$Chromosome)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#把染色体名从阿拉伯数字改为“chr1”、“chrX”的形式
scores[scores$Chromosome==23, "Chromosome"] <- "X"
scores[scores$Chromosome==24, "Chromosome"] <- "Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
# Important step for accurate length to match back to continual chrom loci
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 1
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -1
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score) - 0.1, max(scores$G.score) + 0.1)
title <- paste0("TCGA ESCA overall copy number gistic score", " ", "n=", length(dataSub$patient))
plot(scores.amp$Start.geno, scores.amp$G.score,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i",
xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "gistic score", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+0.15,ylim[2]-0.15)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
#dev.off()
#pdf("ESCA_copy_number_percentage.pdf",12,15)
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency<- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency<- scores.del$frequency * -100
# copy number percentage plot
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim<- c(-90,90)
title=paste0("ESCA overall copy number percentage"," ","n=",length(dataSub$patient))
plot(scores.amp$Start.geno, scores.amp$frequency,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i",
xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "gain/loss percentage in cohort", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .5))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(-80,80)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
pdf("cnv.scores.gistic.pdf",15,12)
par(mfrow=c(4,1), mar = par()$mar + c(3,0,0,3))
### ESCC ###
scores <- read.table("easy_input_ESCC.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 1
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -1
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score)-0.1,max(scores$G.score)+0.1)
title=paste0("ESCC copy number gistic score"," ","n=",length(ESCC_id))
plot(scores.amp$Start.geno, scores.amp$G.score,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "gistic score", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+0.05,ylim[2]-0.05)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
### EAC ###
scores <- read.table("easy_input_EAC.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -100
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score)-0.1,max(scores$G.score)+0.1)
title=paste0("EAC copy number gistic score"," ","n=",length(EAC_id))
plot(scores.amp$Start.geno, scores.amp$G.score,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
### GA_CIN ###
scores <- read.table("easy_input_GA_CIN.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 1
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -1
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score)-0.1,max(scores$G.score)+0.1)
title=paste0("GA_CIN copy number gistic score"," ","n=",length(GA_CIN_id))
plot(scores.amp$Start.geno, scores.amp$G.score,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "gistic score", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+0.05,ylim[2]-0.05)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
### GA_nonCIN ###
scores <- read.table("easy_input_GA_nonCIN.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$G.score <- scores.amp$G.score * 1
scores.del <- scores[scores$Type=="Del",]
scores.del$G.score <- scores.del$G.score * -1
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$G.score)-0.1,max(scores$G.score)+0.1)
title=paste0("GA_nonCIN copy number gistic score"," ","n=",length(GA_nonCIN_id))
plot(scores.amp$Start.geno, scores.amp$G.score,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "gistic score", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$G.score, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+0.05,ylim[2]-0.05)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
dev.off()
## quartz_off_screen
## 2
pdf("cnv.frequence.pdf",15,12)
par(mfrow=c(4,1), mar = par()$mar + c(3,0,0,3))
### ESCC ###
scores <- read.table("easy_input_ESCC.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency <- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency <- scores.del$frequency * -100
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$frequency)-0.1,max(scores$frequency)+0.1)
title=paste0("ESCC, n=",length(ESCC_id))
plot(scores.amp$Start.geno, scores.amp$frequency,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
### EAC ###
scores <- read.table("easy_input_EAC.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency <- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency <- scores.del$frequency * -100
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$frequency)-0.1,max(scores$frequency)+0.1)
title=paste0("EAC, n=",length(EAC_id))
plot(scores.amp$Start.geno, scores.amp$frequency,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
### GA_CIN ###
scores <- read.table("easy_input_GA_CIN.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency <- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency <- scores.del$frequency * -100
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$frequency)-0.1,max(scores$frequency)+0.1)
title=paste0("GA_CIN, n=",length(GA_CIN_id))
plot(scores.amp$Start.geno, scores.amp$frequency,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
### GA_nonCIN ###
scores <- read.table("easy_input_GA_nonCIN.scores.gistic.txt", sep="\t",header=T,stringsAsFactors = F)
# Important step for accurate length to match back to continual chrom loci
scores[scores$Chromosome==23,"Chromosome"]="X"
scores[scores$Chromosome==24,"Chromosome"]="Y"
chrID <- unname(unlist(chrom$chrom2chr[as.character(paste0("chr",scores$Chromosome))]))
scores$Start.geno <- scores$Start + chrom$chromosomes$chr.length.cumsum[chrID]
scores$End.geno <- scores$End + chrom$chromosomes$chr.length.cumsum[chrID]
# Prepare input data for ploting
scores.amp <- scores[scores$Type=="Amp",]
scores.amp$frequency <- scores.amp$frequency * 100
scores.del <- scores[scores$Type=="Del",]
scores.del$frequency <- scores.del$frequency * -100
scores <- rbind.data.frame(scores.amp,scores.del)
# seg.col = list(gain = "red", outscale.gain = "darkred", loss = "blue", outscale.red = "midnightblue")
ylim <- c(min(scores$frequency)-0.1,max(scores$frequency)+0.1)
title=paste0("GA_nonCIN, n=",length(GA_nonCIN_id))
plot(scores.amp$Start.geno, scores.amp$frequency,
pch = ".", type='h',cex = 2, xaxs = "i", yaxs = "i", xlim = c(0,chrom$genome.length), ylim = ylim,
main = title, cex.main = 2, ylab = "Frequency", xlab = NA,
cex.lab = 2, col = adjustcolor("darkred", alpha.f = .8), xaxt = "n", lwd = 2, las=1) # las=1 rotating axis labels in R
lines(scores.del$Start.geno, scores.del$frequency, type='h', lwd = 2, col = adjustcolor("midnightblue", alpha.f = .8))
ink <- chrom$chromosomes$chrN %in% chrID
yrange = abs(diff(ylim))
m.pos <- c(ylim[1]+10,ylim[2]-10)
m.mod <- -(chrom$chromosomes$chrN[ink] %% 2) +2
try(text(x = chrom$chromosomes$middle.chr.geno[ink], y = m.pos[m.mod], labels = chrom$chromosomes$chrom[ink], cex = 1))
abline(h = 0.0, col = 1, lwd = 1, lty = 3)
abline(v = c(0,chrom$chromosomes$chr.length.sum), col = 1, lty = 3, lwd = 1)
col1 <- adjustcolor("darkred", alpha.f = .8)
col2 <- adjustcolor("midnightblue", alpha.f = .8)
# The position of the legend can be specified also using the following keywords : "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "topright", "right" and "center".
legend("topleft", c("gain","loss"), cex=0.6, bty="n", fill=c(col1,col2))
dev.off()
## quartz_off_screen
## 2
library(TCGAbiolinks)
query <- GDCquery(project = "TCGA-ESCA",
data.category = "Copy Number Variation",
data.type = "Copy Number Segment",
sample.type = c("Primary solid Tumor","Solid Tissue Normal"))
GDCdownload(query,method = "api")
tumor.cnv <- GDCprepare(query = query, save = TRUE, save.filename = "tumorCNV.rda")
To generate discrete copy number data file you may need to run GISTIC 2.0. GISTIC 2.0 can be [installed]http://www.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=216&p=t or run online using the GISTIC 2.0 module on [GenePattern]https://cloud.genepattern.org/. Running GISTIC 2.0 requires two input files:
出自:https://cbioportal.readthedocs.io/en/latest/Data-Loading-Tips-and-Best-Practices.html
source /etc/profile
source /etc/profile.d/modules.sh
module add impi/5.1.3
module add intel/16.0.3
module add java/1.8.0_91
#mpirun -np 48 /share/apps/vasp/5.4.1/intel/16.0.2/bin/vasp_std
echo --- creating output directory ---
basedir=`pwd`/esca.basal.seg #esca.luma.seg
mkdir -p $basedir
echo --- running GISTIC ---
## input file definitions
segfile=`pwd`/esca.basal.seg.cnv.txt #esca.luma.seg.cnv.txt
refgenefile=`pwd`/refgenefiles/hg19.UCSC.add_miR.140312.refgene.mat
## call script that sets MCR environment and calls GISTIC executable
./gistic2 -b $basedir -seg $segfile -refgene $refgenefile -genegistic 1 -smallmem 1 -broad 1 -brlen 0.5 -conf 0.95 -armpeel 1 -savegene 1 -gcm extreme
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.14.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] TCGAbiolinks_2.10.4 BSgenome.Hsapiens.UCSC.hg19_1.4.0
## [3] BSgenome_1.50.0 rtracklayer_1.42.2
## [5] Biostrings_2.50.2 XVector_0.22.0
## [7] GenomicRanges_1.34.0 GenomeInfoDb_1.18.2
## [9] IRanges_2.16.0 S4Vectors_0.20.1
## [11] BiocGenerics_0.28.0
##
## loaded via a namespace (and not attached):
## [1] backports_1.1.3 circlize_0.4.5
## [3] AnnotationHub_2.14.4 aroma.light_3.12.0
## [5] plyr_1.8.4 selectr_0.4-1
## [7] ConsensusClusterPlus_1.46.0 lazyeval_0.2.1
## [9] splines_3.5.1 BiocParallel_1.16.6
## [11] ggplot2_3.1.0 sva_3.30.1
## [13] digest_0.6.18 foreach_1.4.4
## [15] htmltools_0.3.6 magrittr_1.5
## [17] memoise_1.1.0 cluster_2.0.7-1
## [19] doParallel_1.0.14 limma_3.38.3
## [21] ComplexHeatmap_1.20.0 readr_1.3.1
## [23] annotate_1.60.1 matrixStats_0.54.0
## [25] sesameData_1.0.0 R.utils_2.8.0
## [27] prettyunits_1.0.2 colorspace_1.4-0
## [29] blob_1.1.1 rvest_0.3.2
## [31] ggrepel_0.8.0 xfun_0.5
## [33] dplyr_0.8.0.1 crayon_1.3.4
## [35] RCurl_1.95-4.12 jsonlite_1.6
## [37] genefilter_1.64.0 zoo_1.8-4
## [39] survival_2.43-3 iterators_1.0.10
## [41] glue_1.3.0 survminer_0.4.3
## [43] gtable_0.2.0 sesame_1.0.0
## [45] zlibbioc_1.28.0 GetoptLong_0.1.7
## [47] DelayedArray_0.8.0 wheatmap_0.1.0
## [49] shape_1.4.4 scales_1.0.0
## [51] DESeq_1.34.1 DBI_1.0.0
## [53] edgeR_3.24.3 ggthemes_4.1.0
## [55] Rcpp_1.0.0 cmprsk_2.2-7
## [57] xtable_1.8-3 progress_1.2.0
## [59] bit_1.1-14 matlab_1.0.2
## [61] km.ci_0.5-2 preprocessCore_1.44.0
## [63] httr_1.4.0 RColorBrewer_1.1-2
## [65] pkgconfig_2.0.2 XML_3.98-1.19
## [67] R.methodsS3_1.7.1 locfit_1.5-9.1
## [69] DNAcopy_1.56.0 tidyselect_0.2.5
## [71] rlang_0.3.1 later_0.8.0
## [73] AnnotationDbi_1.44.0 munsell_0.5.0
## [75] tools_3.5.1 downloader_0.4
## [77] generics_0.0.2 RSQLite_2.1.1
## [79] ExperimentHub_1.8.0 broom_0.5.1
## [81] evaluate_0.13 stringr_1.4.0
## [83] yaml_2.2.0 knitr_1.22
## [85] bit64_0.9-7 survMisc_0.5.5
## [87] purrr_0.3.1 randomForest_4.6-14
## [89] EDASeq_2.16.3 nlme_3.1-137
## [91] mime_0.6 R.oo_1.22.0
## [93] xml2_1.2.0 biomaRt_2.38.0
## [95] compiler_3.5.1 curl_3.3
## [97] interactiveDisplayBase_1.20.0 tibble_2.0.1
## [99] geneplotter_1.60.0 stringi_1.3.1
## [101] GenomicFeatures_1.34.4 lattice_0.20-38
## [103] Matrix_1.2-15 KMsurv_0.1-5
## [105] pillar_1.3.1 BiocManager_1.30.4
## [107] GlobalOptions_0.1.0 data.table_1.12.0
## [109] bitops_1.0-6 httpuv_1.4.5.1
## [111] R6_2.4.0 latticeExtra_0.6-28
## [113] hwriter_1.3.2 promises_1.0.1
## [115] ShortRead_1.40.0 gridExtra_2.3
## [117] codetools_0.2-16 assertthat_0.2.0
## [119] SummarizedExperiment_1.12.0 rjson_0.2.20
## [121] GenomicAlignments_1.18.1 Rsamtools_1.34.1
## [123] GenomeInfoDbData_1.2.0 mgcv_1.8-27
## [125] hms_0.4.2 grid_3.5.1
## [127] tidyr_0.8.3 rmarkdown_1.11
## [129] ggpubr_0.2 Biobase_2.42.0
## [131] shiny_1.2.0